Abstract. Guidelines and consistency rules of UML are used to control
the degrees of freedom provided by the language to prevent faults.
Guidelines are used in specific domains (e.g., avionics) to recommend 
the proper use of technologies. Consistency rules are used to deal with 
inconsistencies in models. However, guidelines and consistency rules use 
informal restrictions on the uses of languages, which makes checking diffi- 
cult. In this paper, we consider these problems from a language-theoretic 
view. We propose the formalism of C-Systems, short for "formal language 
control systems". A C-System consists of a controlled grammar and a 
controlling grammar. Guidelines and consistency rules are formalized as 
controlling grammars that control the uses of UML, i.e. the derivations 
using the grammar of UML. This approach can be implemented as a 
parser, which can automatically verify the rules on a UML user model 
in XMI format. A comparison to related work shows our contribution: a 
generic top-down and syntax-based approach that checks language level 
constraints at compile-time. 



1 Introduction 

The UML (Unified Modeling Language) is a graphical modeling language developed
by the OMG (Object Management Group), and defined by the specifications
[1] and [2]. UML has emerged as the software industry's dominant modeling
language for specifying, designing and documenting the artifacts of systems [3][4].

Evolving descriptions of software artifacts are frequently inconsistent, and
tolerating this inconsistency is important [5][6]. Different developers construct
and update these descriptions at different times during development [7], which
results in inconsistencies. They develop multiple views of a system, providing
pieces of information that are redundant or complementary. Constraints exist
on these pieces of information, and violating them leads to inconsistent models.
Inconsistency problems of UML models have attracted great attention from both
the academic and industrial communities [8][9][10]. A list of 635 consistency
rules is identified in [11][12].
Guidelines, which also contain a set of rules, are often required on models
that are specific to a given context. For instance, OOTiA (Object-Oriented
Technology in Aviation) demands that "the length of an inheritance should be
less than 6" [13]. This context is domain specific. If these constraints are not
respected, a fault is not certain to be present, but the risk of one is high.
The context can also be technology specific. For instance, "multiple inheritance
should be avoided in safety critical, certified systems" (IL #38 of [13]) if the
UML models are implemented in Java, as this language does not provide a
multiple inheritance mechanism.

At first glance, consistency rules and guidelines seem unrelated.
However, from a language-theoretic view, they have the same origin.
We noticed that both types of potential faults in models come from
the degrees of freedom offered by languages. These degrees of freedom cannot
be eliminated without reducing the language capabilities [2]. For instance, the
multiple diagrams in UML are useful, as they describe various viewpoints on one
system, even though they are the origin of numerous inconsistencies. In the same
way, multiple inheritance can be implemented in the C++ language.

To prevent these risks of faults, the use of languages must be controlled.
Guidelines are a long-established and popular means of doing so in industry.
However, they are expressed informally and are difficult to check. For instance,
6 months were needed to check 350 consistency rules on an avionics UML model
including 116 class diagrams.

This paper aims at formalizing the acceptable use of languages and proposing 
a way to check the use correctness, by considering guidelines and consistency 
rules from a language-theoretical view. To achieve this goal, acceptable uses of a 
language are defined as a grammar handling the productions of the grammar of 
the language. To support this idea, UML must be specified by a formal language, 
or at least a language with precisely defined syntax, e.g., XMI in this paper. Thus, 
a graphic model can be serialized. This formalism also provides a deeper view 
on the origin of inconsistencies in models. 

This paper is organized as follows. First, we introduce the grammar of UML
in XMI in Section 2. Then in Section 3 we define the C-System, i.e. a formalism
containing controlling grammars that restrict the use of the grammar of
UML. We illustrate the formalism using examples in Section 4. Related work
and implementation of this approach are discussed in Sections 5 and 6. Section
7 concludes the paper.

2 The Grammar of UML in XMI 

XMI (XML Metadata Interchange) [15] is used to facilitate interchanging UML 
models between different modeling tools in XML format. Many tools implement 
the conversion, e.g., Altova UModel® can export UML models as XMI files. 

A UML model in XMI is an XMI-compliant XML document that conforms
to its XML schema, and is a derivative of the XMI document productions, which
are defined as a grammar. The XML schema is a derivative of the XMI schema
productions. The XMI specification defines both the XMI schema productions
and the XMI document productions in [15].

XMI provides a mapping between a UML user model and an XML docu- 
ment, and a mapping between UML (also MOF) and an XML Schema. XMI 
generates an XML file using the XMI document productions, and generates an 
XML schema using the XMI schema productions. Each of the two sets of
productions composes a context-free grammar in EBNF [15]. A UML user model
can be expressed using an XMI-compliant XML document that conforms to the 
corresponding XML Schema, and is a derivative of the XMI document grammar. 

The grammar and its productions for deriving XMI-compliant XML documents
of UML models are defined in [15]. The main part of the grammar is
given below. To make our presentation more concise, we omit declaration
and version information of XML files (and the related productions whose names
start with "1").

To make later reasoning easier, we modified some representations of the pro- 
ductions, but without changing the generative power of the grammar. 

1. The choice operator "|" is used to compose several productions with the
same left-hand side into a single line in [15]. We decomposed some of these
productions into several productions without the choice operator. An original
production n having k choices might be divided into a set of productions
{n_i}, 1 ≤ i ≤ k. For example, the original production 2 with three choices was
divided into the productions 2_1, 2_2 and 2_3.

2. The closure operator "*" is used to simplify the representation of the
grammar in [15], but it would also make the representation of reasoning confusing.
Thus, the productions whose names start with "3" were added to replace the
productions with closure operators.
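
The first transformation (removing the choice operator) is mechanical. As a minimal sketch, a production with k alternatives can be relabeled as shown below. The function name `split_choices` and the representation of a production as a (left-hand side, right-hand side) pair are assumptions made for illustration and are not part of [15].

```python
def split_choices(label, lhs, alternatives):
    """Split one production with several alternatives into labeled productions.

    `alternatives` is a list of right-hand sides, each a list of symbols.
    Returns a dict mapping labels such as "2_1" to (lhs, rhs) pairs.
    """
    return {
        f"{label}_{index}": (lhs, list(rhs))
        for index, rhs in enumerate(alternatives, start=1)
    }


# Production 2 of the XMI document grammar has three alternatives.
xmi_element = split_choices(
    "2",
    "XMIElement",
    [["2a:XMIObjectElement"], ["2b:XMIValueElement"], ["2c:XMIReferenceElement"]],
)
```

Each resulting label names exactly one right-hand side, which is what allows a sequence of labels to describe a derivation unambiguously.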

The grammar G of UML in XMI includes the following productions (each 
production is labeled by a name starting with a digit): 



    3_1:  XMIElements ::= 2:XMIElement
    3_2:  XMIElements ::= 2:XMIElement 3:XMIElements
    2_1:  XMIElement ::= 2a:XMIObjectElement
    2_2:  XMIElement ::= 2b:XMIValueElement
    2_3:  XMIElement ::= 2c:XMIReferenceElement
    2a_1: XMIObjectElement ::= "<" 2k:QName 2d:XMIAttributes "/>"
    2a_2: XMIObjectElement ::= "<" 2k:QName 2d:XMIAttributes ">"
                               3:XMIElements "</" 2k:QName ">"
    2b_1: XMIValueElement ::= "<" xmiName ">" value "</" xmiName ">"
    2b_2: XMIValueElement ::= "<" xmiName "nil='true'/>"
    2c_1: XMIReferenceElement ::= "<" xmiName 2l:LinkAttribs "/>"
    2c_2: XMIReferenceElement ::= "<" xmiName 2g:TypeAttrib
                                  2l:LinkAttribs "/>"
    2d_1: XMIAttributes ::= 2g:TypeAttrib 2e:IdentityAttribs
                            3h:FeatureAttribs
    2d_2: XMIAttributes ::= 2e:IdentityAttribs 3h:FeatureAttribs
    2e:   IdentityAttribs ::= 2f:IdAttribName "='" id "'"
    2f_1: IdAttribName ::= "xmi:id"
    2f_2: IdAttribName ::= xmiIdAttribName
    2g:   TypeAttrib ::= "xmi:type='" 2k:QName "'"
    3h_1: FeatureAttribs ::= 2h:FeatureAttrib
    3h_2: FeatureAttribs ::= 2h:FeatureAttrib 3h:FeatureAttribs
    2h_1: FeatureAttrib ::= 2i:XMIValueAttribute
    2h_2: FeatureAttrib ::= 2j:XMIReferenceAttribute
    2i:   XMIValueAttribute ::= xmiName "='" value "'"
    2j:   XMIReferenceAttribute ::= xmiName "='" (refId | 2n:URIref)+ "'"
    2k:   QName ::= "uml:" xmiName | xmiName
    2l:   LinkAttribs ::= "xmi:idref='" refId "'" | 2m:Link
    2m:   Link ::= "href='" 2n:URIref "'"
    2n:   URIref ::= (2k:QName)? uriReference

In the grammar, the symbol "::=" stands for the conventional rewriting symbol
"→" in formal language theory [17]. Each nonterminal starts with a capital
letter and is prefixed by the label of the related production, e.g.,
"2:XMIElement" is a nonterminal with possible productions "2_1, 2_2, 2_3".
Each terminal starts with a lowercase letter or is quoted.

As an example to illustrate the use of the grammar, Figure 1 represents a
package Root which includes three classes, where the class FaxMachine is derived
from Scanner and Printer. The core part of the exported XMI 2.1 compliant file
(using Altova UModel®) is as follows:

    <uml:Package xmi:id="U00000001-7510-11d9-86f2-000476a22f44"
                 name="Root">
      <packagedElement xmi:type="uml:Class"
                       xmi:id="U572b4953-ad35-496f-af6f-f2f048c163b1"
                       name="Scanner" visibility="public">
        <ownedAttribute xmi:type="uml:Property"
                        xmi:id="U46ec6e01-5510-43a2-80e9-89d9b780a60b"
                        name="sid" visibility="protected"/>
      </packagedElement>
      <packagedElement xmi:type="uml:Class"
                       xmi:id="Ua9bd8252-0742-4b3e-9b4b-07a95f7d242e"
                       name="Printer" visibility="public">
        <ownedAttribute xmi:type="uml:Property"
                        xmi:id="U2ce0e4c8-88ee-445b-8169-f4c483ab9160"
                        name="pid" visibility="protected"/>
      </packagedElement>
      <packagedElement xmi:type="uml:Class"
                       xmi:id="U6dea1ea0-81d2-4b9c-aab7-a830765169f0"
                       name="FaxMachine" visibility="public">
        <generalization xmi:type="uml:Generalization"
                        xmi:id="U3b334927-5573-40cd-a82b-1ee065ada72c"
                        general="U572b4953-ad35-496f-af6f-f2f048c163b1"/>
        <generalization xmi:type="uml:Generalization"
                        xmi:id="U86a6818b-f7e7-42d9-a21b-c0e639a4f716"
                        general="Ua9bd8252-0742-4b3e-9b4b-07a95f7d242e"/>
      </packagedElement>
    </uml:Package>

Fig. 1. A Class Diagram (classes Scanner, Printer and FaxMachine)

Fig. 2. An Activity Diagram (includes an edge guarded by [x>10])

This text is a derivative of the XMI document productions, cf. the grammar G
above. We may use the sequence of productions "2a_2, 2k(Package), 2d_2,
2e, 2f_1, 3h_1, 2h_1, 2i" to derive the following sentential form:

    <uml:Package xmi:id="U00000001-7510-11d9-86f2-000476a22f44"
                 name="Root">
    3:XMIElements "</" 2k:QName ">"




Note that the production 2k has a parameter xmiName, i.e. the value of
the terminal when the production is applied. In a derivation, we specify a value
of the parameter as "2k(value)". For example, "2k(Package)" is a derivation using
2k with xmiName = "Package". For simplicity, we consider "2k(value)" as a
terminal as a whole. We continue to apply productions, and finally derive the
XMI file previously presented.

Notice that the model of Fig. 1 (both in UML and XML) does not conform
to the guidelines in OOTiA about multiple inheritance, since it uses multiple
inheritance. The model of Fig. 2 has an inconsistency: "the number of outgoing
edges of ForkNode is not the same as the number of incoming edges of
JoinNode". In particular, JoinNode joins two outgoing edges from the same
DecisionNode. This join transition will never be activated, since only one of the
two outgoing edges will be fired.

We will define a formal model to check the conformance to these rules by 
controlling the use of the grammar of UML. 

3 The C-System: A Formal Language Control System 

In this section, we propose the formal model for controlling the use of grammars
based on classical language theory [17].

Let $G = (N, T, P, S)$ be a grammar, where $N$ is the set of nonterminals, $T$ is
the set of terminals, $P$ is the set of productions of the form $l : A \to \alpha$,
where $l$ is the name of the production, $A \in N$, $\alpha \in (N \cup T)^*$, and
$S$ is the start symbol. A derivation using a specified production $p$ is denoted
by $\alpha \overset{p}{\Rightarrow} \beta$, and multiple derivations are denoted by
$\alpha \Rightarrow^* \gamma$.

Definition 1. A controlling grammar $\hat{G}$ over a controlled grammar (or
simply grammar) $G = (N, T, P, S)$ is a quadruple
$\hat{G} = (\hat{N}, \hat{T}, \hat{P}, \hat{S})$, where $\hat{T} = P$.
The language $L(\hat{G})$ is called a controlling language. □

The symbol $\hat{G}$ is read "control G" or "G hat". For ease of reading, we
assume that $N \cap \hat{N} = \emptyset$. $\hat{T} = P$ means that the terminals of
$\hat{G}$ are exactly the productions of $G$.

If we use an automaton $A$ to process the input string, such that $L(A) = L(G)$,
then we can also use a controlling automaton $\hat{A}$ to represent the controlling
language.

As we know, each string $w \in L(G)$ has at least one leftmost derivation
(denoted by "lm") using a sequence of productions from $P$, e.g.
$p_1 p_2 \cdots p_k$. The controlling grammar restricts the derivation in the sense
that the sequences of applied productions should be in the language it specifies,
i.e., $p_1 p_2 \cdots p_k \in L(\hat{G})$. Formally, we have the following definition.

Definition 2. Given a grammar $G = (N, T, P, S)$, the language of the grammar
with a controlling grammar $\hat{G}$ is:

$$L(G \vec{\cdot} \hat{G}) = \{\, w \mid S \underset{lm}{\overset{p_1}{\Rightarrow}} w_1 \cdots \underset{lm}{\overset{p_k}{\Rightarrow}} w_k = w,\ p_1, p_2, \ldots, p_k \in P \text{ and } p_1 p_2 \cdots p_k \in L(\hat{G}) \,\}$$

We say that $G$ and $\hat{G}$ constitute a C-System $C = G \vec{\cdot} \hat{G}$, short
for formal language control system. The language $L(C) = L(G \vec{\cdot} \hat{G})$ is
called a global system language. □

The symbol $\vec{\cdot}$ is called "meta composition". Its left operand is
controlled by the right operand, which is a meta-level grammar. In automata-based
notation, a string $w \in L(A \vec{\cdot} \hat{A})$ if and only if $A$ accepts $w$,
and $\hat{A}$ accepts the sequence of the labels of the transitions used.

A regular C-System is a C-System of which the controlled grammar $G$
is a regular grammar (or $A$ is a finite automaton). Some variants of regular
C-Systems are proposed for ensuring system safety requirements, e.g.
Input/Output C-Systems [18] and Interface C-Systems [19][20]. We denote by
$\mathcal{C}_R$ the family of regular C-Systems.

A context-free C-System is a C-System of which the controlled grammar
$G$ is a context-free grammar (or $A$ is a pushdown automaton). We denote by
$\mathcal{C}_{CF}$ the family of context-free C-Systems.

Generally, we denote by $\mathcal{C}_X^Y$ the family of C-Systems that consist of
an X-type controlled grammar and a Y-type controlling grammar, where
$X, Y \in \{R, CF\}$. Although X and Y could also be other types in the Chomsky
hierarchy, e.g. context-sensitive, this is beyond the scope of this paper.

Obviously, the set of accepted inputs is a subset of the controlled language, 
such that the sequence of the applied productions belongs to the controlling 
language. Consider a simple example as follows. 

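Example 1. Given a regular grammar $G$ and a regular controlling grammar $\hat{G}$:

$$G \left\{ \begin{array}{l} p_1 : S \to aS \\ p_2 : S \to bS \\ p_3 : S \to \varepsilon \end{array} \right. \qquad \hat{G} \left\{ \begin{array}{l} \hat{S} \to p_1 \hat{S} \mid p_3 \hat{S} \mid p_2 A \\ A \to p_2 A \mid p_3 A \mid \varepsilon \end{array} \right.$$

$L(G)$ is the language $(a|b)^*$, e.g., aab, abab. $L(\hat{G})$ is the language
$(p_1|p_3)^* p_2 (p_2|p_3)^*$. The trivial grammar $G$ is considered to provide a
simple illustration of the introduced principles.

The grammars $G$ and $\hat{G}$ constitute a regular C-System
$C = G \vec{\cdot} \hat{G} \in \mathcal{C}_R^R$.

Given the string $aab \in L(G)$, we conclude that $aab \in L(G \vec{\cdot} \hat{G})$,
because we have the leftmost derivation
$S \overset{p_1}{\Rightarrow} aS \overset{p_1}{\Rightarrow} aaS \overset{p_2}{\Rightarrow} aabS \overset{p_3}{\Rightarrow} aab$,
where $p_1 p_1 p_2 p_3 \in L(\hat{G})$ as
$\hat{S} \Rightarrow p_1 \hat{S} \Rightarrow p_1 p_1 \hat{S} \Rightarrow p_1 p_1 p_2 A \Rightarrow p_1 p_1 p_2 p_3 A \Rightarrow p_1 p_1 p_2 p_3$.
On the contrary, we have $abab \notin L(G \vec{\cdot} \hat{G})$. Although we have the
leftmost derivation
$S \overset{p_1}{\Rightarrow} aS \overset{p_2}{\Rightarrow} abS \overset{p_1}{\Rightarrow} abaS \overset{p_2}{\Rightarrow} ababS \overset{p_3}{\Rightarrow} abab$,
$p_1 p_2 p_1 p_2 p_3 \notin L(\hat{G})$.

In fact, the language $L(C) = L(G \vec{\cdot} \hat{G})$ is equivalent to the language
$a^* b^+$, which is the subset of $(a|b)^*$ satisfying the constraints: "every a
should appear before every b" and "at least one b". □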
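
Example 1 can be replayed mechanically. The following Python sketch is an illustration only: it assumes the three productions $p_1 : S \to aS$, $p_2 : S \to bS$, $p_3 : S \to \varepsilon$ for the controlled grammar, encodes the controlling language as the regular expression $(p_1|p_3)^* p_2 (p_2|p_3)^*$, and uses hypothetical helper names (`leftmost_labels`, `in_c_system`).

```python
import re

# Controlled grammar G of Example 1: p1: S -> aS, p2: S -> bS, p3: S -> epsilon.
PRODUCTION_FOR_SYMBOL = {"a": "p1", "b": "p2"}

# Controlling language (p1|p3)* p2 (p2|p3)*, written over production labels.
CONTROL = re.compile(r"(?:p1|p3)*p2(?:p2|p3)*")


def leftmost_labels(word):
    """Return the labels of the unique leftmost derivation of `word` in G,
    or None if `word` is not in L(G) = (a|b)*."""
    labels = []
    for symbol in word:
        if symbol not in PRODUCTION_FOR_SYMBOL:
            return None
        labels.append(PRODUCTION_FOR_SYMBOL[symbol])
    labels.append("p3")  # every derivation ends with S -> epsilon
    return labels


def in_c_system(word):
    """True iff `word` is in L(G) and its label sequence is in the
    controlling language."""
    labels = leftmost_labels(word)
    return labels is not None and CONTROL.fullmatch("".join(labels)) is not None
```

With these definitions, `in_c_system("aab")` holds while `in_c_system("abab")` does not, matching the two derivations discussed in the example.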

We remark here that our model is different from regularly controlled grammars
[21][22] in the sense that we restrict derivations to be leftmost and allow
context-free controlling grammars. These differences result in different theoretical
results, which are beyond the scope of this paper.
4 Examples

In this section we use some practical examples to illustrate the idea of the
previous section. We denote the grammar of UML by $G = (N, T, P, S)$, where $P$ is
the set of productions listed in Section 2 and each production $p \in P$ is labeled
by a name starting with a digit.

Example 2. Consider two rules on class diagrams: 

Rule 1: Each class can have at most one generalization. This rule is a guide- 
line, as we mentioned in Section 1 and at the end of Section 2. This rule is also 
a consistency rule in the context of Java, since Java does not allow multiple 
inheritance. However we may derive a class from multiple classes in the context 
of C++. 

Rule 2: Each class can have at most 30 attributes. This rule may be adopted 
by software authorities as a guideline in avionics, in order to increase the safety 
of software systems by minimizing the complexity of classes. 

Note that these rules cannot be explicitly integrated into the grammar of
UML, but only recommended as guidelines or consistency rules. We cannot put
Rule 1 into the standard of UML, since UML models can be implemented with
both the C++ and Java programming languages. Rule 2 is a restriction for a
specific domain, and we should not require all programmers to use a limited
number of attributes by building the limit into the UML language.

We aim to specify the rules at the meta-language level, and thus to control the
use of the language. Consider the example of Fig. 1. To obtain the associated
XMI text, the sequence of applied productions of $G$ in the leftmost derivation is
as follows ("..." stands for some omitted productions, to save space):

    2a_2, 2k(Package), 2d_2, 2e, 2f_1, 3h_1, 2h_1, 2i,
    ..., 2k(packagedElement), ..., 2k(Class),
    ..., 2k(ownedAttribute), ..., 2k(Property),
    ..., 2k(packagedElement),
    ..., 2k(packagedElement), ..., 2k(Class),
    ..., 2k(ownedAttribute), ..., 2k(Property),
    ..., 2k(packagedElement),
    ..., 2k(packagedElement), ..., 2k(Class),
    ..., 2k(generalization), ..., 2k(Generalization),
    ..., 2k(generalization), ..., 2k(Generalization),
    ..., 2k(packagedElement),
    ..., 2k(Package)

Let c, g stand for 2k(Class), 2k(Generalization), respectively. Note that the
occurrence of two g after the third c violates Rule 1. In fact, all the sequences
of productions in the pattern "...c...g...g..." are not allowed by the rule (there is
no c between the two g), indicating that the class has two generalizations.
Thus, we propose the following controlling grammar $\hat{G}_c$ to restrict the use
of the language to satisfy Rule 1:

$$\hat{G}_c \left\{ \begin{array}{l} S \to c\,Q_c \mid D\,S \mid D \\ Q_c \to c\,Q_c \mid g\,Q_g \mid D\,Q_c \mid D \\ Q_g \to c\,Q_c \mid D\,Q_g \mid D \\ D \to \{\, p \mid p \in P \wedge p \notin \{c, g\} \,\} \end{array} \right.$$

where $S, Q_c, Q_g, D$ are nonterminals, and $D$ includes all productions except
c, g. $L(\hat{G}_c)$ accepts the sequences of productions satisfying Rule 1.

Implicitly, the controlling grammar specifies an automaton $\hat{A}_c$, shown in
Fig. 3, which has an implicit error state (the dashed circle). Strings of the
pattern $D^* c D^* g D^* g D^*$ will lead $\hat{A}_c$ to the error state.

If the sequence of productions applied to derive a model is accepted by the
language $L(\hat{G}_c)$, then the model conforms to Rule 1. In Fig. 1, the derivation
of the class FaxMachine uses the pattern
$D^* c D^* g D^* g D^* \notin L(\hat{G}_c)$, which leads to the error state of the
automaton, thus it violates Rule 1. On the contrary, the derivations
of Scanner and Printer are accepted by $L(\hat{G}_c)$, and thus satisfy Rule 1.
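
The automaton for Rule 1 can be simulated directly. The sketch below is a simplified illustration, not the authors' implementation: state names follow the nonterminals of the controlling grammar, a missing transition plays the role of the implicit error state, and the function only detects that error state (it ignores the grammar's requirement that a sequence end with a D production). The function names are hypothetical.

```python
# States named after the nonterminals of the controlling grammar for Rule 1.
START, IN_CLASS, HAS_GENERALIZATION = "S", "Qc", "Qg"


def classify(label):
    """Map a production label to the alphabet {c, g, D} used by Rule 1."""
    if label == "2k(Class)":
        return "c"
    if label == "2k(Generalization)":
        return "g"
    return "D"  # every other production


# A missing (state, symbol) entry stands for the implicit error state.
TRANSITIONS = {
    (START, "c"): IN_CLASS,
    (START, "D"): START,
    (IN_CLASS, "c"): IN_CLASS,
    (IN_CLASS, "g"): HAS_GENERALIZATION,
    (IN_CLASS, "D"): IN_CLASS,
    (HAS_GENERALIZATION, "c"): IN_CLASS,
    (HAS_GENERALIZATION, "D"): HAS_GENERALIZATION,
}


def satisfies_rule_1(labels):
    """Return False as soon as the label sequence reaches the error state."""
    state = START
    for label in labels:
        key = (state, classify(label))
        if key not in TRANSITIONS:
            return False
        state = TRANSITIONS[key]
    return True
```

Note that `2k(generalization)` (the element name) is classified as D, while `2k(Generalization)` (the `xmi:type` value) is the symbol g, as in the production sequence listed above.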

Now consider Rule 2. Let c, pr, pe stand for 2k(Class), 2k(Property),
2k(packagedElement), respectively. Note that the occurrence of more than 30
pr after a c violates Rule 2. In fact, all the sequences of productions in the
pattern "...c...(pr...)^n, n > 30" are not allowed by the rule (there is no c between
any two pr), indicating that the class has more than 30 attributes.

To satisfy Rule 2, we propose the following controlling grammar $\hat{G}_p$ to
restrict the use of the language:

$$\hat{G}_p \left\{ \begin{array}{ll} S \to pe\,S \mid c\,Q_c \mid D\,S \mid D & \\ Q_c \to pe\,S \mid c\,Q_c \mid pr\,Q_1 \mid D\,Q_c \mid D & \\ Q_i \to pe\,S \mid c\,Q_c \mid pr\,Q_{i+1} \mid D\,Q_i \mid D & (1 \le i < 30) \\ Q_{30} \to pe\,S \mid c\,Q_c \mid D\,Q_{30} \mid D & \\ D \to \{\, p \mid p \in P \wedge p \notin \{c, pr, pe\} \,\} & \end{array} \right. \qquad (2)$$

where $S, Q_c, Q_i$ are nonterminals, and $D$ includes all productions except
c, pr, pe. $L(\hat{G}_p)$ accepts the sequences of productions satisfying Rule 2.

Implicitly, the controlling grammar specifies an automaton $\hat{A}_p$, shown in
Fig. 4. Strings of the pattern "$D^* c D^* (pr\, D^*)^n$, $n > 30$" will lead
$\hat{A}_p$ to the error state.

Fig. 4. The Automaton $\hat{A}_p$



If the sequence of productions applied to derive a model is accepted by the
language $L(\hat{G}_p)$, then the model conforms to Rule 2. In Fig. 1, the derivations
of the classes Scanner and Printer use the pattern
$D^* c D^* pr D^* \in L(\hat{G}_p)$, and thus satisfy Rule 2.
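
Because the states $Q_1, \ldots, Q_{30}$ only count properties, the automaton for Rule 2 can be simulated with a single counter. The following sketch makes that simplification explicit; the function name and the `limit` parameter are assumptions introduced for illustration, and, as in the previous sketch, only the error state is detected.

```python
MAX_ATTRIBUTES = 30  # bound stated by Rule 2


def satisfies_rule_2(labels, limit=MAX_ATTRIBUTES):
    """Simulate the automaton for Rule 2 on a sequence of production labels.

    `count` is None in state S (outside any class) and k in state Q_k
    (inside a class with k properties seen so far; 0 stands for Q_c).
    """
    count = None
    for label in labels:
        if label == "2k(packagedElement)":   # pe: go back to S
            count = None
        elif label == "2k(Class)":           # c: enter Q_c
            count = 0
        elif label == "2k(Property)":        # pr: Q_k -> Q_{k+1}
            if count is None or count >= limit:
                return False                 # implicit error state
            count += 1
        # any other production (D) leaves the state unchanged
    return True
```

A property seen in state S is rejected because the grammar has no production of the form $S \to pr \ldots$; a thirty-first property is rejected because $Q_{30}$ has no pr alternative.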

Thanks to the controlling grammars, when a model violates required rules,
the controlling language will reject the model (the implicit error state will be
activated). Some error handling method may be called to process the error, e.g.,
printing an error message indicating the position and the cause.

We can also use a controlling grammar to handle a consistency rule concerning
activity diagrams.

Example 3. In an activity diagram, the number of outgoing edges of ForkNode
should be the same as the number of incoming edges of its pairwise JoinNode.

Let n, f, j, i, o stand for 2k(node), 2k(ForkNode), 2k(JoinNode), 2k(incoming),
2k(outgoing), respectively. We propose the following controlling grammar
$\hat{G}_a$ to restrict the use of the language to satisfy the rule:



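$$\hat{G}_a \left\{ \begin{array}{l} S \to N\,F\,I^*\,Q\,O^*\,N \mid N\,I^*\,O^*\,N \mid D^* \\ Q \to O\,Q\,I \mid N\,S\,N\,J \\ N \to n\,D^* \\ F \to f\,D^* \\ J \to j\,D^* \\ I \to i\,D^* \\ O \to o\,D^* \\ D \to \{\, p \mid p \in P \wedge p \notin \{n, f, j, i, o\} \,\} \end{array} \right. \qquad (3)$$
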
$L(\hat{G}_a)$ accepts all the sequences of productions of the pattern
$N F I^* O^n N S N J I^n O^* N$, which lead to models respecting the rule. This
context-free grammar implicitly specifies a PDA (Pushdown Automaton [17]), which
is more complex than the automata in Figures 3 and 4.

Globally, any UML user model $M$ derived from the C-System
$C = G \vec{\cdot} \hat{G}_a \in \mathcal{C}_{CF}^{CF}$, i.e. $M \in L(C)$, conforms to
the rule in Example 3.

As a more concrete instance, we consider the model in Fig. 2. The XMI-compliant
document of the model in Fig. 2 is as follows:

    <packagedElement xmi:type="uml:Activity"
                     xmi:id="U937506ed-af64-44c6-9b4c-e735bb6d8cc6"
                     name="Activity1" visibility="public">
      <node xmi:type="uml:InitialNode"
            xmi:id="U16aa15e8-0e5d-4fd1-930a-725073ece9f0">
        <outgoing xmi:idref="Ue9366b93-a45b-43f1-a201-2038b0bd0b30"/>
      </node>
      <node xmi:type="uml:ForkNode"
            xmi:id="U26768518-a40c-4713-b35e-c267cc660508" name="ForkNode">
        <incoming xmi:idref="Ue9366b93-a45b-43f1-a201-2038b0bd0b30"/>
        <outgoing xmi:idref="Ua800ba9b-e167-4a7c-a9a9-80e6a77edeb7"/>
      </node>
      <node xmi:type="uml:DecisionNode"
            xmi:id="Uc9e4f0de-8da6-4c98-9b95-b4cde30ccfc0" name="DecisionNode">
        <incoming xmi:idref="Ua800ba9b-e167-4a7c-a9a9-80e6a77edeb7"/>
        <outgoing xmi:idref="Ua4a2b313-13d6-4d69-9617-4803560731ef"/>
        <outgoing xmi:idref="U6eede33f-98ac-4654-bb17-dbe6aa7e46be"/>
      </node>
      <node xmi:type="uml:JoinNode"
            xmi:id="Ud304ce3c-ebe4-4b06-b75a-fa2321f8a151" name="JoinNode">
        <incoming xmi:idref="Ua4a2b313-13d6-4d69-9617-4803560731ef"/>
        <incoming xmi:idref="U6eede33f-98ac-4654-bb17-dbe6aa7e46be"/>
      </node>
      <edge xmi:type="uml:ControlFlow"
            xmi:id="Ua4a2b313-13d6-4d69-9617-4803560731ef"
            source="Uc9e4f0de-8da6-4c98-9b95-b4cde30ccfc0"
            target="Ud304ce3c-ebe4-4b06-b75a-fa2321f8a151">
        <guard xmi:type="uml:LiteralString"
               xmi:id="U6872f3b3-680c-430e-bdb3-21c0a317d290"
               visibility="public" value="x>10"/>
      </edge>
      <edge xmi:type="uml:ControlFlow"
            xmi:id="U6eede33f-98ac-4654-bb17-dbe6aa7e46be"
            source="Uc9e4f0de-8da6-4c98-9b95-b4cde30ccfc0"
            target="Ud304ce3c-ebe4-4b06-b75a-fa2321f8a151">
        <guard xmi:type="uml:LiteralString"
               xmi:id="Ub853080d-481c-46ff-9f7c-92a31ac24349"
               visibility="public" value="else"/>
      </edge>
      <edge xmi:type="uml:ControlFlow"
            xmi:id="Ua800ba9b-e167-4a7c-a9a9-80e6a77edeb7"
            source="U26768518-a40c-4713-b35e-c267cc660508"
            target="Uc9e4f0de-8da6-4c98-9b95-b4cde30ccfc0"/>
      <edge xmi:type="uml:ControlFlow"
            xmi:id="Ue9366b93-a45b-43f1-a201-2038b0bd0b30"
            source="U16aa15e8-0e5d-4fd1-930a-725073ece9f0"
            target="U26768518-a40c-4713-b35e-c267cc660508"/>
    </packagedElement>

It is easy to detect that the sequence of applied productions
"...nD*fD*iD*oD*nD*...nD*jD*iD*i..." is not accepted by $L(\hat{G}_a)$ (one o
follows f, while two i follow j), thus there is an inconsistency.
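
The counting argument above can be sketched with an explicit stack. The code below is a deliberately simplified stand-in for the pushdown automaton, not a parser for the controlling grammar itself: it assumes that every node is bracketed by an opening and a closing n, that D productions have already been filtered out, and that a fork left without a join is also reported. The function name is hypothetical.

```python
def fork_join_consistent(tokens):
    """Check fork/join edge counts on a sequence over {n, f, j, i, o}."""
    stack = []        # outgoing-edge counts of forks still awaiting a join
    kind = None       # None outside a node, else "plain", "fork" or "join"
    incoming = outgoing = 0
    for token in tokens:
        if token == "n" and kind is None:      # opening tag of a node
            kind, incoming, outgoing = "plain", 0, 0
        elif token == "n":                     # closing tag of the node
            if kind == "fork":
                stack.append(outgoing)
            elif kind == "join":
                if not stack or stack.pop() != incoming:
                    return False
            kind = None
        elif token == "f" and kind is not None:
            kind = "fork"
        elif token == "j" and kind is not None:
            kind = "join"
        elif token == "i" and kind is not None:
            incoming += 1
        elif token == "o" and kind is not None:
            outgoing += 1
    return not stack and kind is None


# Fig. 2: initial node, fork (1 outgoing), decision node, join (2 incoming).
FIG_2 = list("non" + "nfion" + "nioon" + "njiin")
```

The stack is what makes nested fork/join pairs checkable, which is the reason a context-free controlling grammar is needed here rather than a regular one.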

We remark here that there are two preconditions for using the controlling
grammar, concerning the order of the model elements in the XML document:
1. ForkNode must appear before its pairwise JoinNode; 2. incoming edges
must appear before outgoing edges in a node. The two conditions are trivial,
since it is easy to control the positions of these elements in the exported XMI
documents when implementing such a transformation.

5 Related Work 

The most popular technique for verifying software correctness is model checking
[23]. In this framework, we have three steps in verifying a system. First, we
formalize system behavior as a model (e.g., a transition system, a Kripke model
[24]). Second, we specify the properties that we aim at validating using temporal
logics. Third, we use a certain checking algorithm to search for a counterexample
which is an execution trace violating the specified properties. If the algorithm
finds such a counterexample, we have to correct the original design.

Most checking tools use specific semantics of UML diagrams. They have the
flavor of model checking, e.g., Egyed's UML/Analyzer [25][26] and OCL (Object
Constraint Language) [27]. First, developers design UML diagrams as a model.
Then, the consistency rules are specified as OCL or similar expressions. Certain
algorithms are executed to detect counterexamples that violate the rules [25].
Note that these techniques do not distinguish between rules on the model level
and those concerning language-level features.

Unlike these techniques, our framework takes another way of ensuring cor- 
rectness. It consists of the following steps: 

1. Specifying the grammar G of a language. It specifies an operational seman- 
tics, which defines what a language is able to model. Developing the grammar 
is mainly performed by language designers. 
2. Modeling correctness rules of the use of languages as a controlling grammar
$\hat{G}$. It specifies a correctness semantics, which defines what a language is
authorized to derive. This process is the duty of safety engineers, whose
responsibility is to assure the correct use of the language.

3. The two grammars constitute a consistent language as a whole, that is, any
derivation of the global system language is a correct and consistent use of
the language.

In particular, our work differs from model checking in the following aspects: 
1. Our work has different objectives, and uses different approaches from those of
model checking. As we show in Fig. 5, model checking techniques use a bottom-up
approach: they verify execution traces $T^*$ at the lower level $L_1$ to prove
the correct use of the grammar $G$ at the middle level $L_2$. In contrast, our
proposal uses a top-down approach: we model correctness rules as acceptable
sequences of productions ($P^*$) at the higher level $L_3$ to ensure the correct use
of $G$. Then any derivatives (at $L_1$) that conform to the C-System
$C = G \vec{\cdot} \hat{G}$ are definitely a correct use. So the two techniques are
complementary.



[Figure: Level $L_3$ (rules of use), Level $L_2$ (modeling language), Level $L_1$ (models, codes)]

Fig. 5. Three Levels of the Framework
2. Our work and model checking express language-level and model-level
constraints, respectively. Language-level constraints are more effective, because
they are inherently reusable. That is, we only need to develop one language-level
constraint and apply it to all the models in the language. With model checking,
however, we need to replicate model-level constraints for each model. On the other
hand, model checking can process model-specific constraints.

3. Our work and model checking use syntax-based and semantics-based
approaches (or static and dynamic analysis), respectively. As a result, our approach
is generic and metamodel-independent, and is little concerned with semantics. So
it can be applied to all MOF-compliant languages, not only to UML. However,
model checking techniques depend on the semantics of a language, thus specific
algorithms should be developed for different models.

4. Our work and model checking catch errors at compile-time and run-time,
respectively. As a result, our approach implements membership checking
of context-free languages, which is decidable. That is, it searches in a limited
space, which is defined by grammars. Model checking may search in a larger,
even infinite space, so we have to limit the space of computing, and introduce
the risk of missing solutions.

6 Discussion 

In this section, we briefly discuss some issues which are beyond
the scope of this paper.

The first issue concerns the implementation of controlling grammars. The
controlled and controlling grammars can be implemented using two separate
parsers. The technique for constructing a parser from a context-free grammar is
rather mature [29][30]. Some tools provide automated generation of parsers from
a grammar specification, such as Yacc and Bison.

Notice that the inputs of controlling parsers are the sequences of productions
applied in the parsing of $L(G)$. So there are communications between the two
parsers. Once module $G$ uses a production $p_i$, the name of the production
is sent to $\hat{G}$ as an input. If $L(\hat{G})$ accepts the sequence of productions
and $L(G)$ accepts the model, then $L(G \vec{\cdot} \hat{G})$ accepts the model.
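
The communication between the two parsers can be approximated with a streaming XML parser. The sketch below is not the tool envisaged here: it uses Python's `xml.etree.ElementTree.iterparse` in place of a parser generated from the XMI grammar, emits only approximate 2k(...) labels (one per opening tag, one per `xmi:type` value, one per closing tag, so a self-closing element yields an extra label), and feeds them to a Rule 1 checker. The namespace URIs, the sample document and the function names are placeholders chosen for illustration.

```python
import io
import xml.etree.ElementTree as ET

XMI_NS = "http://example.org/xmi"  # placeholder URI, not the official one
SAMPLE = f"""<uml:Package xmlns:uml="http://example.org/uml" xmlns:xmi="{XMI_NS}"
    xmi:id="p1" name="Root">
  <packagedElement xmi:type="uml:Class" xmi:id="c1" name="FaxMachine">
    <generalization xmi:type="uml:Generalization" xmi:id="g1" general="c2"/>
    <generalization xmi:type="uml:Generalization" xmi:id="g2" general="c3"/>
  </packagedElement>
</uml:Package>"""


def local_name(qualified):
    """Strip an ElementTree '{uri}' prefix or an XML 'prefix:' from a name."""
    return qualified.rsplit("}", 1)[-1].rsplit(":", 1)[-1]


def production_labels(xmi_text):
    """Yield approximate 2k(...) labels in document order."""
    stream = io.BytesIO(xmi_text.encode("utf-8"))
    for event, element in ET.iterparse(stream, events=("start", "end")):
        yield f"2k({local_name(element.tag)})"
        if event == "start":
            xmi_type = element.get(f"{{{XMI_NS}}}type")
            if xmi_type is not None:
                yield f"2k({local_name(xmi_type)})"


def first_rule_1_violation(labels):
    """Consume labels one at a time, as a controlling parser would, and
    return the index of the label that triggers the error state, or None."""
    generalizations = None            # None: no class seen yet
    for index, label in enumerate(labels):
        if label == "2k(Class)":
            generalizations = 0
        elif label == "2k(Generalization)":
            if generalizations != 0:
                return index
            generalizations = 1
    return None
```

Because `production_labels` is a generator, the checker sees each label as soon as the XML parser produces it, and parsing stops at the first violation.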

The second issue deals with multiple rules. If we have multiple guidelines or
consistency rules, each rule is formalized using a grammar. We can develop an
automated tool that converts the grammars into automata, and then combines
these automata to compute their intersection, i.e., an automaton A' [17]. The
intersection A' can be used as a controlling automaton, which specifies a
controlling language L(A') that includes all the semantics of the rules.
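
For rules expressed as finite automata, the intersection is the standard product construction. The following sketch illustrates it on two toy rules over the alphabet {c, g, a, d} (class, generalization, attribute, other); the `DFA` class, the two example automata and the bound of two attributes are assumptions made to keep the example small, and context-free controlling grammars would need a different treatment.

```python
class DFA:
    """Deterministic automaton; a missing transition acts as the implicit
    error state."""

    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = set(accepting)
        self.transitions = dict(transitions)  # (state, symbol) -> state

    def accepts(self, symbols):
        state = self.start
        for symbol in symbols:
            if (state, symbol) not in self.transitions:
                return False
            state = self.transitions[(state, symbol)]
        return state in self.accepting


def intersect(left, right):
    """Product construction: accepts exactly the sequences accepted by both."""
    transitions = {}
    for (p, symbol), p_next in left.transitions.items():
        for (q, other), q_next in right.transitions.items():
            if symbol == other:
                transitions[((p, q), symbol)] = (p_next, q_next)
    accepting = {(p, q) for p in left.accepting for q in right.accepting}
    return DFA((left.start, right.start), accepting, transitions)


# Toy rule A: at most one generalization g per class c.
rule_g = DFA("S", {"S", "Qc", "Qg"}, {
    **{(s, "c"): "Qc" for s in ("S", "Qc", "Qg")},
    **{(s, x): s for s in ("S", "Qc", "Qg") for x in "ad"},
    ("Qc", "g"): "Qg",
})

# Toy rule B: at most two attributes a per class c.
rule_a = DFA(0, {0, 1, 2}, {
    **{(k, "c"): 0 for k in (0, 1, 2)},
    **{(k, x): k for k in (0, 1, 2) for x in "gd"},
    (0, "a"): 1,
    (1, "a"): 2,
})

both_rules = intersect(rule_g, rule_a)
```

A single pass over the production sequence with `both_rules` then checks both rules at once.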

The third issue is about the tradeoff between the cost and the benefits of applying
the proposed approach. It may seem that writing a controlling grammar is expensive,
because it involves formal methods. However, this is probably not the case. As
we mentioned, a controlling grammar specifies language-level constraints, and
can be reused by all the models derived from the controlled grammar. Thus
the controlling grammar can be identified and formalized by the organizations
who define the language or its authorized usage, e.g., OMG and FAA (Federal
Aviation Administration), respectively. Developers and software companies can
use the published standard controlling grammar for checking inconsistencies in
their models. By contrast, if all users write their own checking algorithms
and code, e.g., in OCL or other programming languages, the code will be hard
for other users, who have different models to check, to reuse. Thus the total
cost for all the users may be higher. Of course, gathering more empirical results
on the tradeoff is a good direction for future work.

7 Conclusion 

We provided a language-theoretic view on guidelines and consistency rules of
UML. We proposed the formalism of C-Systems, short for "formal language
control systems". To the best of our knowledge, no related work has proposed
a similar methodology. Rules are considered as controlling grammars which
control the use of modeling languages. This methodology is generic, syntax-based
and metamodel-independent. It provides a top-down approach that checks and
reports violations of language-level constraints at compile-time. It can also be
applied to all MOF-compliant languages, not only to UML, since it does not
depend on the specific semantics of languages.

Since we focused on the methodological foundation, one direction for future work
is to develop an automated checking tool implementing the presented principles.
We will also examine instant checking techniques for our method. One feature of
UML/Analyzer is instant checking, which only verifies the small portion where
the model changes, in order to save the cost of checking [31]. Intuitively, our
approach is also easy to extend to instant checking. We only need to generate
the XMI document of the changed part of diagrams (e.g. a class in a class
diagram), and verify it. However, this calls for more detailed work.